Decomposition of fissile isotope antineutrino spectra using convolutional neural network 
Recent reactor antineutrino experiments have observed that the antineutrino spectrum changes as the reactor
core evolves and that the individual fissile isotope antineutrino spectra can be decomposed from the evolving
data, providing valuable information on the inconsistencies between reactor models and data. We propose a
machine learning method that builds a convolutional neural network and test it on a virtual experiment with a typical
short-baseline reactor antineutrino experiment configuration: by utilizing the reactor evolution information, the
major fissile isotope spectra are correctly extracted, and the uncertainties are evaluated using the Monte Carlo
method. Validation tests show that the method is unbiased and introduces only tiny extra uncertainties.


Keywords: Reactor antineutrino, Isotope antineutrino spectrum decomposition, Convolutional neural network 


I. INTRODUCTION 


Significant deviations between the Huber-Mueller model
and experimental isotope antineutrino spectra have been
demonstrated, appearing as a ~6% deficit in the reactor
antineutrino flux (the so-called Reactor Antineutrino Anomaly,
RAA) and an excess of reconstructed positron signal events in
the 4-6 MeV region (the so-called 5-MeV bump) [1-6]. Determining
the origin of the reactor antineutrino rate and shape anomaly
is critical, especially for understanding nuclear physics and
improving nuclear databases for fundamental and applied
research. Relevant experimental and theoretical efforts have
been made to solve the aforementioned problem, including
attempts to determine the individual isotope contributions
to the reactor $\bar{\nu}_e$, which have provoked further investigations. In
2017, the Daya Bay experiment revealed a 7.8% discrepancy
between the observed and predicted $^{235}$U yields by using the
span of effective $^{239}$Pu fission fractions, suggesting that $^{235}$U may be the
primary contributor to the RAA [7]. In 2019, the PROSPECT
experiment measured the $^{235}$U spectrum from the highly enriched
uranium of the High Flux Isotope Reactor, and the
$^{235}$U spectrum shape was consistent with a deviation relative
to the prediction made by the Daya Bay experiment in
the energy region of 5-7 MeV [8]. In the same year, the theoretical
result of the summation method was compared with
that of the Daya Bay experiment without any renormalization,
which reduced the flux discrepancy to 1.9% by including the
correction for the pandemonium effect [9]. Also in 2019, the
Daya Bay experiment extracted the $^{235}$U and $^{239}$Pu antineutrino
spectra from commercial reactors for the first time by using the reactor
evolution information [10].

Determining individual isotope antineutrino spectra can 
also play an important role in nuclear safeguards. The International
Atomic Energy Agency (IAEA) cooperates with neutrino
physicists to develop new reactor monitoring
methods based on observing the $\bar{\nu}_e$ emitted from the reactor,
where the isotope antineutrino spectra are key inputs for the
reactor monitoring applications [11] because the reactor antineutrino
flux and spectra are sensitive to the changes of the
fuel content in the reactor core and can be observed via a suit- 
able antineutrino detector. The applied neutrino physics com- 
munity also explored the reactor antineutrinos as a tool for 
reactor monitoring and concluded that improving the knowl- 
edge of the reactor antineutrino flux and spectrum is required 
for reactor safeguards applications [12]. The DOE National 
Nuclear Security Administration (NNSA) Office of Defense 
Nuclear Nonproliferation Research and Development (DNN 
R&D) organized a group of neutrino physicists and nuclear 
engineers to find practical roles of neutrino technology in nu- 
clear energy and security; the final report, called Nu Tools, 
asserted that it is possible to exploit the neutrino spectrum 
to determine the fissile material content of the reactor with 
high reactor antineutrino rates [13]. The isotope antineutrino 
spectra decomposed directly from reactor antineutrino exper- 
iments have no RAA or spectrum distortion problem while 
having comparable or better uncertainties than those in the 
Huber-Mueller model, providing more reliable data inputs for 
the nuclear safeguards. 

Only the Daya Bay experiment has published the reactor
isotope antineutrino spectra, using two methods, the minimum
$\chi^2$ and Markov Chain Monte Carlo (MCMC) methods,
which gave consistent results. The minimum $\chi^2$
method is a statistical inference method that minimizes a
$\chi^2$ statistic. The $\chi^2$ function $\chi^2(\theta)$ is an estimator for the
parameter $\theta$ and is composed of a likelihood function comparing the
binned observation data $n = (n_1, \ldots, n_N)$ with the expectation
$\mu(\theta) = (\mu_1(\theta), \ldots, \mu_N(\theta))$, and a penalty term for
constraining the parameters:

$$\chi^2(\theta) = 2\sum_{j=1}^{N}\left[\mu_j(\theta) - n_j + n_j \ln\frac{n_j}{\mu_j(\theta)}\right] + f(\epsilon, \Sigma), \qquad (1)$$

where $n_j$ follows a Poisson distribution, and $f(\epsilon, \Sigma)$ is the
penalty term that constrains the nuisance parameters $\epsilon$ with the
correlations $\Sigma$ of the nuisance parameters. The minimum $\chi^2$
method naturally introduces the statistical uncertainty and the
systematic uncertainties into the estimator, and the best fit parameters
and the corresponding uncertainties can be obtained
by minimizing the $\chi^2$ function. The minimum $\chi^2$ method is
a robust, traditional frequentist fitting method commonly used
in high energy physics. The second method used in the Daya 
Bay decomposition research is the MCMC method based on 
Bayesian inference. In Bayesian theory, all knowledge of the
parameter $\theta$ is summarized in the posterior probability density
function (p.d.f.) $p(\theta|D)$:

$$p(\theta|D) \propto P(D|\theta)\,\pi(\theta), \qquad (2)$$

where $D$ is the data, $\theta$ is the parameter, $P(D|\theta)$ is the likelihood
function, and $\pi(\theta)$ is the prior p.d.f. of $\theta$. By performing
statistical calculations on the posterior p.d.f., the mean
values and uncertainties can be extracted. Usually, calculating
the posterior p.d.f. is difficult, especially for high-dimensional
problems. Instead, the MCMC method is used to sample
the posterior p.d.f., and the mean values and uncertainties
can be obtained by performing calculations on the samples.
In the Daya Bay experiment, the measured data were divided 
into 20 groups of inverse beta-decay (IBD) spectra, corre- 
sponding to different burning stages of a reactor cycle. The 
prediction spectra for the 20 groups were obtained by consid- 
ering the detector and reactor model combined with the reac- 
tor information. Data and prediction were used to construct 
the likelihood function in the minimum $\chi^2$ method and the
Bayesian inference method. The uncertainties from the detectors
and the reactors were incorporated into the penalty terms
in the minimum $\chi^2$ method and the prior p.d.f. in the Bayesian
inference method, respectively. The decomposed isotope spectra
obtained with the two methods are consistent.

The extraction of isotope antineutrino spectra has been 
studied in reactor neutrino physics, and there is no convinc- 
ing answer to RAA; nevertheless, we consider it beneficial to 
study the applications of new methods. Here, we propose a 
new method that uses a convolutional neural network (CNN) 
to decompose primary fissile isotope antineutrino spectra by 
fitting the weekly detected antineutrino spectrum as a func- 
tion of the individual isotope fission fractions. A CNN is a 
network model for machine learning, which provides an opti- 
mal architecture for detecting key features in images and time 
series data. It has broad applications in, for example, computer
vision and natural language processing [14-17], and
it has been used in certain physics research fields to extract
information from experimental data and fit the model parameters
[18]. Notably, the established decomposition methods,
such as the minimum $\chi^2$ and MCMC methods, are offline
algorithms. First, the analysis results must be updated from
scratch as new data arrive, which wastes time, especially
for long-term experiments. Second, the minimum $\chi^2$
method and the MCMC method have to load the entire dataset
into computer memory, which requires a large amount of
memory when dealing with big data, for example, with
many reactor burning cycles and detailed reactor information,
and can make these methods unusable. Moreover, they usually
resample from the original data to reduce the size of the dataset;
however, this processing may introduce loss of information
and bias in the analysis. By contrast, the CNN approach
is an online algorithm [19]. The advantage of online updating
is that analysis can be performed without access to the
historical data; thus, overcoming the storage and computation
limitations is possible in some cases. In addition, the
proposed method makes full use of the data without causing 
excessive information loss. This provides an additional ma- 
chine learning technology for the decomposition of reactor 
fissile isotope spectra and can be used for neutrino spectrum 
analysis in future reactor antineutrino experiments. 


II. SETUP OF THE VIRTUAL EXPERIMENT


In this section, we describe a virtual reactor antineutrino 
experiment to produce the simulation dataset for the proposed 
CNN method for training and testing. 

Suppose there is a virtual experiment with a one-reactor
one-detector layout, where the reactor is a pressurized
water reactor (PWR) and the sole source of the $\bar{\nu}_e$ flux.
Antineutrinos are produced from thousands of beta-decay
branches of the fission products from four major fissile isotopes,
$^{235}$U, $^{238}$U, $^{239}$Pu, and $^{241}$Pu, in the reactor core. A
virtual 20 ton liquid scintillator antineutrino detector with a
50 m baseline from the reactor is set up using the parameters
in TABLE 1. The antineutrinos are detected via IBD reactions
in the detector: $\bar{\nu}_e + p \to e^+ + n$. The predicted $\bar{\nu}_e$ spectrum
at a given time $t$ is calculated as

$$S(E_\nu, t) = \frac{N_p\,\epsilon\,\sigma(E_\nu)\,P_{\mathrm{sur}}(E_\nu, L)}{4\pi L^2}\,\frac{W(t)}{\sum_i f_i(t)\,e_i}\,\sum_i f_i(t)\,S_i(E_\nu), \qquad (3)$$

where $E_\nu$ is the $\bar{\nu}_e$ energy, $N_p$ is the target proton number,
$\epsilon$ is the detection efficiency, $\sigma(E_\nu)$ is the inverse beta-decay
cross section, $L$ is the distance from the reactor to the detector,
$P_{\mathrm{sur}}(E_\nu, L)$ is the $\bar{\nu}_e$ survival probability, $W(t)$ is the
thermal power of the reactor, $e_i$ is the energy released per fission
for isotope $i$, $f_i(t)$ is the fission fraction, and $S_i(E_\nu)$ is the
$\bar{\nu}_e$ energy spectrum of fissile isotope $i$ per fission.


TABLE 1. Parameter list of the virtual experiment.

Parameter                          Value                        Uncertainty
Thermal power, $W$                 2.9 GW                       0.5%
Fission fraction, $f_i$            Ref. [20]                    5%
Energy/fission, $e_i$              Ref. [21]                    0.2%
Detection efficiency, $\epsilon$   80.25%                       1.5%
Target protons, $N_p$              $1.43 \times 10^{30}$        0.92%
Baseline, $L$                      50 m                         negligible


For the virtual experiment, the isotope antineutrino spectra
$S_i(E_\nu)$ are assumed to be the same as those in the Huber-Mueller
model, denoted by $S_i^{\mathrm{HM}}(E_\nu)$. Using the configurations
of the Daya Bay experiment as a reference, the experimental
parameter values in Eq. (3) are presented in
TABLE 1.

In addition to TABLE 1, the fission fraction evolution of a
fuel cycle is presented in the top panel of Fig. 1, where the
fission fractions of the four major fissile isotopes are shown
as a function of the burn-up. For a PWR, the reactor core usually
consists of three batches of fuel assemblies with different
ages, and usually, the oldest one-third of the fuel is replaced by
fresh fuel at the end of a refueling cycle. During the reactor
burning time, the fissile isotopes are mainly depleted by fission,
decay, and neutron capture processes. Some of them,
such as the plutonium isotopes, are also generated by neutron
captures and decays of the mother nuclei in the reaction
chains. The depletion and generation of the fissile isotopes
are essential for the evolution of the reactor fuels.


[Fig. 1 image. Top panel: fission fractions of $^{235}$U, $^{238}$U, $^{239}$Pu, and $^{241}$Pu versus burn-up (MWD/TU). Bottom panel: $E_\nu$ (MeV) versus time (week).]


Fig. 1. (Color online) (Top panel) Isotope fission fractions
in one fuel cycle. The sum of the fission fractions of the four major
isotopes is normalized to 1. Data are extracted from Ref. [20].
(Bottom panel) Weekly $\bar{\nu}_e$ event rates during the entire experiment
operation. The color represents the observed $\bar{\nu}_e$ event rates. The
operation comprises several fuel cycles, each of which
is similar to that in the top panel. The $\bar{\nu}_e$ event rates vary with the
operating time because the fission fractions of the fuel components change.
Thus, the observed antineutrino spectrum is a function of time and
fission fractions.


Burn-up in the top panel of Fig. 1 is defined as

$$\text{burn-up} = \frac{W \cdot D}{M_{\mathrm{U}}}, \qquad (4)$$

where $W$ is the average power of the fuel element, $D$ is the
number of days since the fuel element began to burn in the
core, and $M_{\mathrm{U}}$ is the initial uranium mass of the fuel element.
In this study, $M_{\mathrm{U}}$ is assumed to be 72 tons.
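The burn-up definition in Eq. (4) can be illustrated with a minimal Python sketch. The function name `burn_up` and the example inputs are assumptions made for illustration only; in particular, the full 2.9 GW thermal power of TABLE 1 is used here as if the whole core were a single fuel element, which is a simplification.

```python
def burn_up(avg_power_mw: float, days: float, initial_uranium_t: float) -> float:
    """Burn-up in MWD/TU following Eq. (4): W * D / M_U.

    avg_power_mw      -- average power W in megawatts
    days              -- number of burning days D
    initial_uranium_t -- initial uranium mass M_U in tons
    """
    if initial_uranium_t <= 0:
        raise ValueError("initial uranium mass must be positive")
    return avg_power_mw * days / initial_uranium_t


# Illustrative input only: 2900 MW for one year with 72 t of uranium.
one_year = burn_up(2900.0, 365.0, 72.0)
print(f"burn-up after 365 days: {one_year:.0f} MWD/TU")
```

Under these assumed inputs, one year of operation corresponds to roughly 1.5 x 10^4 MWD/TU, which is the same order as the burn-up range on the horizontal axis of the top panel of Fig. 1.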

The uncertainties of the fission fractions of the four major 
fissile isotopes are assumed to be 5%, as in the Daya Bay 
experiment, and the correlation matrix of the uncertainties is 
from Ref. [20], which was extracted from the simulations of 
a typical PWR. The energies released per fission are from Ref. 
[21]. All the uncertainties are assumed to be time-correlated 
in this study. 

Due to the fuel evolution of the four major fissile isotopes,
the $\bar{\nu}_e$ emission from the reactor core changes as a function of
time. The bottom panel of Fig. 1 shows the reactor antineutrino
spectrum evolution of nine reactor fuel cycles over 657
weeks. These spectra are treated as measurement data from
the antineutrino detector of the virtual experiment, and they contain
information on the reactor evolution. The individual fissile
isotope antineutrino spectra are decomposed from these observed
spectra by utilizing the reactor information listed in
TABLE 1, which uses typical values similar to those in the
Daya Bay experiment; in a real reactor antineutrino experiment,
this information is provided by the nuclear power plant.

Notably, the IBD cross section $\sigma(E_\nu)$ and the isotope antineutrino
spectrum $S_i(E_\nu)$ are coupled through the antineutrino energy
in Eq. (3). The IBD yield per fission from individual
isotopes can be defined as

$$\sigma_i(E_\nu) = \sigma(E_\nu) \cdot S_i(E_\nu), \quad i = (235, 238, 239, 241), \qquad (5)$$

which is the isotope spectrum to be decomposed in this study,
as in the Daya Bay analysis [10]. In the Huber-Mueller
model case, $\sigma_i(E_\nu)$ is denoted by $\sigma_i^{\mathrm{HM}}(E_\nu)$.

Thus, the predicted $\bar{\nu}_e$ spectrum can be written as a combination
of $\sigma_i(E_\nu)$ and the coefficients $k_i(E_\nu, t)$:

$$S(E_\nu, t) = \sum_i k_i(E_\nu, t) \cdot \sigma_i(E_\nu), \qquad (6)$$

where the coefficient $k_i(E_\nu, t)$ is the product of a set of
experimental parameters in Eq. (3):

$$k_i(E_\nu, t) = \frac{N_p\,\epsilon\,P_{\mathrm{sur}}(E_\nu, L)}{4\pi L^2}\,\frac{W(t)\,f_i(t)}{\sum_{i'} f_{i'}(t)\,e_{i'}}. \qquad (7)$$

Assuming the virtual experiment has run for nine fuel cycles
(~4600 days), information on the reactor thermal power
and antineutrino spectrum is collected weekly during the operation.
As a result, a list of coefficients and $\bar{\nu}_e$ observations
varying with time is provided (see the bottom panel of Fig. 1).
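The factorization of the predicted spectrum into coefficients and IBD yields can be sketched in a few lines of NumPy. This is only an illustration for one week and one energy bin: the function names, the toy inputs, and the use of arbitrary but mutually consistent units are assumptions and are not taken from the original analysis.

```python
import numpy as np


def coefficients(n_p, eff, p_sur, baseline, power, fission_frac, energy_per_fission):
    """Coefficients k_i for one week and one energy bin (product of experimental parameters).

    All inputs are assumed to be given in mutually consistent units.
    Returns an array with one entry per isotope (235U, 238U, 239Pu, 241Pu).
    """
    f = np.asarray(fission_frac, dtype=float)
    e = np.asarray(energy_per_fission, dtype=float)
    geometry = n_p * eff * p_sur / (4.0 * np.pi * baseline**2)
    fission_rate = power / np.dot(f, e)  # total number of fissions per unit time
    return geometry * fission_rate * f


def predicted_spectrum(k, sigma):
    """Predicted spectrum in one energy bin: sum over isotopes of k_i * sigma_i."""
    return float(np.dot(k, sigma))


# Toy inputs (hypothetical values, for illustration only).
k_week = coefficients(
    n_p=1.0, eff=0.8025, p_sur=1.0, baseline=50.0, power=1.0,
    fission_frac=[0.58, 0.08, 0.29, 0.05],
    energy_per_fission=[200.0, 200.0, 200.0, 200.0],
)
print(predicted_spectrum(k_week, [1.0, 1.5, 0.7, 1.0]))
```

Because all isotopes share the same prefactor, the coefficients of one week differ only through the fission fractions, which is what makes the time evolution of the fuel informative about the individual isotope spectra.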
III. CONFIGURATIONS OF CONVOLUTIONAL NEURAL NETWORK


Among the many methods of machine learning, the CNN
is commonly applied to extract the shift-invariant features of
data with its specialized convolutional layer. In the reactor
antineutrino spectrum decomposition study, the isotope antineutrino
spectra are time-invariant in the reactor evolution
data. Thus, the CNN method might be a suitable approach
for extracting the isotope antineutrino spectra. To extract the
isotope spectra from the simulation dataset, we constructed a
one-dimensional CNN. Before describing the CNN model itself, we
introduce the data structures required by the CNN model, the
operation performed on the data, and some key concepts of the
CNN, which are summarized in TABLE 2. The dataset of
the virtual experiment is organized sample by sample, with each
sample tagged by its time in TABLE 2, such as $t_1, t_2, \ldots, t_n$, one for
each week. The measurement data are split into periods of one
week, and each period forms one training sample.
The 'Coefficient' columns of TABLE 2 are the key input of the
CNN, in which the coefficient $k_{t,i}$ is calculated using Eq. (7)
from the virtual experiment for week $t$ and isotope $i$, week
by week. The central part of the CNN is the convolutional
kernel, a small matrix for feature extraction, defined as ($\sigma_{235}$,
$\sigma_{238}$, $\sigma_{239}$, $\sigma_{241}$), which represents the respective isotope
spectra in Eq. (5). A linear operation, called convolution in a CNN, is performed on
the convolutional kernel and input data to generate the output
data in the 'Expectation' column of TABLE 2. The output is the
expected antineutrino spectrum in Eq. (6). The convolutional operation is performed
sample by sample across the entire dataset; in other words,
the convolutional kernel ($\sigma_{235}$, $\sigma_{238}$, $\sigma_{239}$, $\sigma_{241}$)
slides along the timeline and combines with each row of coefficients
to predict the $\bar{\nu}_e$ spectrum. Such a process
returns a list of outputs ('Expectation' column),
which is compared with the label data, the $\bar{\nu}_e$ spectrum observed
by the detector ('Observation' column). Notably, the entries
in TABLE 2 refer to the same energy bin. In this study,
the neutrino energy bins range from 2 to 8 MeV, and each
covers 0.25 MeV; thus, there are 24 energy bins.

TABLE 2. Virtual experiment dataset and convolutional operation.

Sample    Coefficient                                              Expectation                  Observation
$t_1$     $k_{1,235}$  $k_{1,238}$  $k_{1,239}$  $k_{1,241}$       $S^{\mathrm{exp}}(t_1)$      $S^{\mathrm{obs}}(t_1)$
$t_2$     $k_{2,235}$  $k_{2,238}$  $k_{2,239}$  $k_{2,241}$       $S^{\mathrm{exp}}(t_2)$      $S^{\mathrm{obs}}(t_2)$
...       ...                                                      ...                          ...
$t_n$     $k_{n,235}$  $k_{n,238}$  $k_{n,239}$  $k_{n,241}$       $S^{\mathrm{exp}}(t_n)$      $S^{\mathrm{obs}}(t_n)$
          Input                                                    Output                       Label


The CNN aims to learn from reactor antineutrino experimental
data to fit the isotope spectra by updating its convolutional
kernel. Because this study divides the energy range
into 24 bins from 2 to 8 MeV, a corresponding number of
convolutional kernels are employed.
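The convolutional operation described above, one four-component kernel per energy bin sliding along the time axis, can be sketched with plain NumPy. The array layout (weeks, energy bins, isotopes), the random toy values, and the function name `convolve_kernels` are assumptions made for illustration; the original implementation uses Keras layers rather than explicit loops.

```python
import numpy as np


def convolve_kernels(k, sigma):
    """Slide one 4-component kernel per energy bin along the time axis.

    k     -- coefficients, shape (n_weeks, n_bins, 4)
    sigma -- kernels (isotope IBD yields), shape (n_bins, 4)
    Returns the expected spectra, shape (n_weeks, n_bins).
    """
    n_weeks, n_bins, _ = k.shape
    out = np.empty((n_weeks, n_bins))
    for b in range(n_bins):          # one kernel per energy bin
        for t in range(n_weeks):     # the kernel slides week by week
            out[t, b] = np.dot(k[t, b], sigma[b])
    return out


# Toy dataset: 657 weeks, 24 energy bins, 4 isotopes (random values for illustration).
rng = np.random.default_rng(0)
k = rng.uniform(0.5, 1.5, size=(657, 24, 4))
sigma = rng.uniform(0.1, 1.0, size=(24, 4))
expected = convolve_kernels(k, sigma)
print(expected.shape)
```

Since each kernel has width one along the time axis, the operation is equivalent to a per-bin dot product, and the same result can be obtained in a single call with `np.einsum("tbi,bi->tb", k, sigma)`.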

The architecture of the constructed CNN model is shown in
Fig. 2. This CNN model comprises three layers: the convolutional
layer, the flatten layer, and the fully connected layer.
The convolutional layer is where most computations occur.
It requires input data (rectangles on the left side of Fig.
2) and convolutional kernels (the shaded patch on the bottom
left). The input data are the simulation coefficients, as
shown in TABLE 2. For each energy bin, the convolutional operation
is performed on the coefficient table and the respective convolutional
kernel, and the outcomes, representing the expected
$\bar{\nu}_e$, are conveyed to the second layer (bars in the middle,
marked as feature maps). Next, the flattening operation
is applied to transform the multidimensional data into one
dimension. Such a flattening operation is commonly used in
the transition from the convolutional layer to the fully connected
layer. The last layer (bars on the right side), the fully
connected layer, outputs the flattened results as the expectation
of $\bar{\nu}_e$. Later, the CNN compares the output values with
the corresponding $\bar{\nu}_e$ label data and begins its training process
via the so-called back-propagation method. The purpose
of back-propagation is to make the output values as close as
possible to the label values. During the training process, the
CNN repeats back-propagation many times, and in this manner,
the parameters of the convolutional kernel ($\sigma_{235}$, $\sigma_{238}$,
$\sigma_{239}$, $\sigma_{241}$) are adjusted to their best fit values by iteration.
Unlike conventional neural networks, which are often described as
black-box models, this CNN model is interpretable: the convolutional
kernel components carry the information of the isotope
spectra, the inputs corresponding to the convolutional kernel
components represent the fission rates of the four isotopes,
and the outputs simulate the predicted $\bar{\nu}_e$ spectra.

Once the architecture of the CNN has been built, the next
step is tuning the hyperparameters of the CNN model. Hyperparameters
are configurations used to control the training
process, for example, the objective function, optimizer, and
learning rate. Hyperparameters are usually set before
training; therefore, we should find appropriate configurations
before the real decomposition work. This hyperparameter
tuning process is called pre-training to distinguish it
from the subsequent training procedure of the real decomposition
work, in which the hyperparameters are already configured
properly. However, one of the most challenging limitations
is that the hyperparameters cannot be estimated directly from
the data and must be specified manually. Generally, there is
no golden rule, and searching for the best hyperparameters is
conducted by trial and error.

Fig. 2. (Color online) Architecture of the one-dimensional convolutional neural network. It takes the coefficients of isotope spectra as
inputs and performs the convolutional operation by sliding the convolutional kernels over the inputs to form the convolutional layer. The
convolutional results (feature maps) are passed to the next layer (flatten layer) and converted from tensor values to scalar values. The last
layer (fully connected layer) of the CNN outputs the expectations of the antineutrino spectrum in the detector.

During the pre-training process, the simulation dataset fed 
into the CNN is noiseless, and systematic uncertainties of pa- 
rameters in TABLE 1 are assumed to be zero. In other words, 
measurements of the virtual experimental parameters are re- 
garded as being sufficiently precise to suppress the noise ef- 
fects. Such efforts enable the CNN model to determine the 
most suitable hyperparameters. 

Our computation is conducted on a server cluster consist- 
ing of a group of computers with 16-core CPUs. The clus- 
ter provides support for up to 500 multi-core jobs for our 
study. Thus, we are able to decompose from 500 Monte Carlo 
datasets simultaneously [22]. The pre-training of the CNN 
is implemented in Keras 2.3, a user-friendly framework that 
provides a Python frontend for researchers, and Keras uses 
the TensorFlow platform as its backend. These two tools pro- 
vide sufficient standard modules for users to build and train 
the neural network models; thus, our coding is mainly based 
on the standard modules of the two packages. However, we 
need to develop a new objective function prototype for our 
study, which we will explain in detail later. With this clus- 
ter and the two packages, our computation process requires 
~300 Mbytes of memory and ~5 hours for each decomposi- 
tion task. 

For decomposing the individual isotope spectra from the
data, the CNN requires an objective function to optimize the
neural network parameters $\sigma_i$ by reducing the difference between
the output result and the label data. For general regression
problems of a CNN, the mean squared error (MSE) is
the conventional choice, in which no uncertainties are considered.
However, in this study, an objective function in the
form of a $\chi^2$ function, as commonly used in high energy physics
analysis, is constructed by considering the statistical
uncertainty and the uncertainties introduced by $^{238}$U
and $^{241}$Pu. The $\chi^2$ function is defined as

$$\chi^2(E_\nu, \sigma) = \sum_j \frac{\left(S_j^{\mathrm{obs}}(E_\nu) - S_j^{\mathrm{exp}}(E_\nu)\right)^2}{\left(\delta S_j^{\mathrm{obs}}(E_\nu)\right)^2} + \frac{\left(\sigma_{238}(E_\nu) - \sigma_{238}^{\mathrm{HM}}(E_\nu)\right)^2}{\left(\sigma_{238}^{\mathrm{HM}}(E_\nu) \times 15\%\right)^2} + \frac{\left(\sigma_{241}(E_\nu) - \sigma_{241}^{\mathrm{HM}}(E_\nu)\right)^2}{\left(\sigma_{241}^{\mathrm{HM}}(E_\nu) \times 10\%\right)^2}, \qquad (8)$$

where $j$ is the sample index, and $S_j^{\mathrm{obs}}(E_\nu)$ is the observed
$\bar{\nu}_e$ spectrum of the $j$-th sample, assumed to be a Gaussian
distributed variable with statistical uncertainty $\delta S_j^{\mathrm{obs}}(E_\nu)$.
$S_j^{\mathrm{exp}}(E_\nu)$ is the expected $\bar{\nu}_e$ spectrum
of the $j$-th sample, which is calculated by the CNN using the
convolutional operation as follows:

$$S_j^{\mathrm{exp}}(E_\nu) = \sum_i k_i(E_\nu, t_j) \cdot \sigma_i(E_\nu). \qquad (9)$$

The first term of Eq. (8) is a likelihood function that measures
the distance from the predicted $\bar{\nu}_e$ value to its labeled
observation value. As mentioned above, the CNN reduces the
difference by iteratively updating its network parameters. The
other parts of Eq. (8) are the penalty terms that allow the CNN
to use a priori constraints on $\sigma_{238}$ and $\sigma_{241}$ with their
uncertainties. Because the fission fractions of $^{238}$U and $^{241}$Pu are
small and the fuel evolution is not sensitive to the two isotopes,
they are constrained by penalty terms. Using the Huber-Mueller
model as their priors, the shape uncertainties of $^{238}$U
and $^{241}$Pu are assigned values of 15% and 10%, respectively.
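A minimal NumPy sketch of such an objective function for a single energy bin is given below. It is an illustration under stated assumptions, not the authors' Keras implementation: the isotope ordering (235U, 238U, 239Pu, 241Pu), the name `chi2_objective`, and the explicit `stat_err` argument for the statistical uncertainty of each observation are all hypothetical choices.

```python
import numpy as np

# Relative prior uncertainties from the text, keyed by the (assumed) kernel index:
# index 1 -> 238U (15%), index 3 -> 241Pu (10%).
PENALTY = {1: 0.15, 3: 0.10}


def chi2_objective(s_obs, s_exp, stat_err, sigma, sigma_hm):
    """Chi-square for one energy bin: data term plus Huber-Mueller penalty terms.

    s_obs, s_exp, stat_err -- arrays over the samples (weeks)
    sigma, sigma_hm        -- fitted and Huber-Mueller yields, 4 entries each
    """
    s_obs, s_exp, stat_err = (np.asarray(a, dtype=float) for a in (s_obs, s_exp, stat_err))
    data_term = np.sum(((s_obs - s_exp) / stat_err) ** 2)
    penalty = sum(
        ((sigma[i] - sigma_hm[i]) / (sigma_hm[i] * rel)) ** 2
        for i, rel in PENALTY.items()
    )
    return float(data_term + penalty)


# Toy check: a one-sigma (15%) shift of the 238U yield adds one unit of chi-square.
hm = np.array([4.0, 2.0, 3.0, 1.0])
shifted = hm * np.array([1.0, 1.15, 1.0, 1.0])
obs = np.array([100.0, 110.0])
print(chi2_objective(obs, obs, np.sqrt(obs), shifted, hm))
```

The penalty terms only act on the two weakly constrained isotopes, so the 235U and 239Pu components of the kernel remain free and are driven by the data term alone.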

During training, the neural network uses an iterative algo- 
rithm (called optimizer) to minimize the objective function 
and adjust its internal network parameters. In this study, the 
CNN implements the adaptive moment estimation (Adam) 
method as its optimizer, which facilitates the computation of 
learning rates by using the first and second moments of the 
gradient [23, 24]. 
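For readers unfamiliar with Adam, a single parameter update can be sketched as follows. The default values of `lr`, `beta1`, `beta2`, and `eps` are the commonly quoted defaults of the Adam algorithm and are assumptions here; the learning rates actually used in this study follow the schedule described later in the text.

```python
import numpy as np


def adam_step(param, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; `step` counts updates starting from 1."""
    m = beta1 * m + (1.0 - beta1) * grad        # first moment (running mean of gradients)
    v = beta2 * v + (1.0 - beta2) * grad**2     # second moment (running mean of squared gradients)
    m_hat = m / (1.0 - beta1**step)             # bias correction for the zero initialization
    v_hat = v / (1.0 - beta2**step)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


# First update of a single parameter with gradient 4.0.
p, m, v = adam_step(np.array([1.0]), np.array([4.0]), np.zeros(1), np.zeros(1), step=1)
print(p, m, v)
```

On the first step the bias-corrected moments reduce to the gradient and its square, so the parameter moves by approximately `lr` regardless of the gradient magnitude; this normalization is what makes the effective step size adaptive.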

Initially, the CNN parameter $\sigma_i$ is set as follows:

$$\sigma_i(E_\nu) = \frac{1}{5}\,\sigma_i^{\mathrm{HM}}(E_\nu), \quad (i = 235, 238, 239, 241). \qquad (10)$$

Sometimes, the starting point of the parameters is crucial for
the training result because the optimizer of the neural network
is susceptible to becoming stuck at local optima. Hence, to examine the
influence of different initial parameter values, a 50% uncertainty
is assigned to $\sigma_i$ in Eq. (10) as an initialization test,
and the results are almost identical. This shows that the CNN
model is not sensitive to the parameter initialization scheme
in this study.

Based on the objective function and optimizer, the neu- 
ral network follows the specified algorithm to iteratively up- 
date its parameters. Controlling the speed of parameter value 
change (called learning rate) is important because a learning 
rate that is too large might cause the model to converge too 
quickly to a local optima solution, whereas a rate that is too 
small would result in the process being stuck. In this study, 
the learning rates of the CNN parameters follow the schedule 
shown in the top panel of Fig. 3, where the learning rates 
are functions of the epoch. The high-energy parameters
are configured with smaller learning rates than the low-energy
ones, mainly because the isotope spectra take small values
at high energy; therefore, the CNN requires finer control
of the updates in this region.

An epoch refers to training the neural network with all 
training data for one cycle. It consists of one or more batches, 
where a part of the dataset is used as the input. The number 
of samples in a batch is called batch size. In this study, the 
batch size is set to four samples, hence, four weeks of data 
are passed into the CNN between each iteration of the param- 
eters. 
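The relation between samples, batches, and epochs can be made concrete with a short sketch. It assumes, for illustration only, that all 657 weekly samples are used for training and that the last, incomplete batch is kept; the helper name `iterate_batches` is hypothetical.

```python
import math


def iterate_batches(n_samples, batch_size):
    """Yield the index ranges of consecutive batches; the last batch may be shorter."""
    for start in range(0, n_samples, batch_size):
        yield range(start, min(start + batch_size, n_samples))


n_samples, batch_size = 657, 4
batches = list(iterate_batches(n_samples, batch_size))
print(len(batches), math.ceil(n_samples / batch_size), len(batches[-1]))
```

Under these assumptions one epoch consists of 165 parameter updates, the last of which uses a single week of data.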

When the CNN prepares to train on the data, the number of
training cycles (called epochs) should be set before the training
starts. However, determining the exact optimal number of
epochs for the model is difficult. Depending on the network
model and the dataset, we must determine when the
parameters have converged and when the CNN model should
stop its training process. In machine learning, on the
one hand, we might have the over-fitting problem, where the
neural network model fits the training data perfectly but
generalizes poorly to new data, usually because of
an excessive number of training cycles. On the other hand,
we might have the under-fitting problem if the model does not
sufficiently learn the data, usually due to a low epoch number.
To determine whether a neural network model has converged,
the common practice is to examine the variation in
the training results with epochs. If the number of epochs is
set too low, the training process terminates before the model
converges. By contrast, if the number of epochs is set too
high, the model is probably over-fitting. Thus, the number of
epochs should be chosen carefully.

Fig. 3. (Color online) (Top panel) Learning rate schedule of the
CNN. All learning rates are divided into two groups, among which
those of the low energy region (< 6 MeV) decay slower with the
epoch than those of the high energy region (> 6 MeV). (Bottom
panel) The superposition of the decomposed results from thousands
of training runs. With the increasing epoch, the verification factor
gradually approaches 100%. After the epoch exceeds 1500, the
decomposed results tend to be steady.

For evaluating and visualizing the effectiveness of the CNN
decomposition method, a verification factor is defined as

$$\mathrm{ratio}_i(E_\nu) = \frac{\sigma_i(E_\nu)}{\sigma_i^{\mathrm{HM}}(E_\nu)} \times 100\%, \qquad (11)$$

which is the ratio between the predicted isotope spectrum and
the truth spectrum.
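As a small illustration, the verification factor can be evaluated bin by bin with NumPy. The function name and the toy numbers are assumptions for illustration only.

```python
import numpy as np


def verification_factor(sigma_fit, sigma_truth):
    """Ratio of the decomposed spectrum to the truth spectrum, in percent, per energy bin."""
    return 100.0 * np.asarray(sigma_fit, dtype=float) / np.asarray(sigma_truth, dtype=float)


# Toy example: a perfect fit in the first bin and a 1% excess in the second.
ratios = verification_factor([1.0, 2.02], [1.0, 2.0])
print(ratios)
```

A value of 100% in every energy bin corresponds to a decomposition that exactly reproduces the input Huber-Mueller spectrum.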

In this study, we evaluate the influence of
different numbers of epochs by conducting thousands of training
processes and superposing their results in one plot, as shown
in the bottom panel of Fig. 3, where the x-axis represents
the training cycle number, the y-axis represents the verification
factor, and the color represents the frequency
of the training results. When the number of epochs reaches
~1500, the verification factor stably converges to nearly
100%. Conservatively, the number of epochs is set to 2000.

After the hyperparameters have been determined, we com- 
plete the pre-training process and establish the entire CNN 
model. Maintaining the same configurations, we prepare to 
test the decomposition performance of the CNN with the ex- 
perimental data. In this study, the simulation data are used 
instead. 


IV. RESULTS OF DECOMPOSITION 


Using the aforementioned hyperparameter configurations, 
the CNN decomposes the individual isotope spectra from 
both noiseless and noisy simulation datasets. In this study, 
we mainly examine the unbiasedness and uncertainties of the 
decomposition results by using the CNN method. 

Using noiseless datasets, in which both systematic uncer- 
tainties and statistical error are ignored, the decomposition is 
implemented 1000 times, and the extracted spectra samples 
are compared with the truth values to evaluate the bias and uncertainties.
As shown in Fig. 4, the ratios of the mean values
of the extracted spectra samples to the truth spectra are presented
as data points. The deviations are less than 0.1%
and can be ignored, so the decomposed isotope spectra
can be regarded as unbiased. The tiny error bars represent the
uncertainties introduced by the CNN model; they are obtained
by calculating the standard deviations of the ratios of
the extracted spectra samples to the truth spectra.
Fig. 4. (Color online) Verification of the unbiasedness of the CNN
method. The data points show the ratio of the decomposed isotope
spectrum to the truth spectrum. The error bars represent
the decomposition uncertainty. For visual clarity, three
of the curves are shifted down; originally, all curves are centered at
100%.


To include noise effects, statistical errors and systematic
uncertainties are assigned to the experimental measurements
by applying Poisson fluctuations and the systematic
uncertainties listed in TABLE 1, respectively. One thousand
different noisy datasets are generated with these
uncertainties, and the individual isotope spectra are
extracted from each of them, so the decomposition results vary
under the noise disturbance. The mean values and standard
deviations of all the decomposition results are shown in
Fig. 5.
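
The toy-dataset procedure can be illustrated as follows. This is only a sketch under stated assumptions: the expected counts, the single fully correlated 1% normalization uncertainty, and the helper names are hypothetical and do not reproduce TABLE 1. The Poisson sampler uses a Gaussian approximation for large means. In the actual procedure each toy dataset would be passed through the CNN before the mean and covariance are computed; here the toys themselves are summarized.

```python
import math
import random

def poisson(rng, lam):
    """Draw a Poisson count; Gaussian approximation for large means."""
    if lam > 50.0:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    # Knuth's multiplication method for small means
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def make_toy(rng, expected, rel_norm_sys):
    """One noisy dataset: shared normalization shift, then Poisson noise."""
    scale = 1.0 + rng.gauss(0.0, rel_norm_sys)
    return [poisson(rng, mu * scale) for mu in expected]

def mean_and_cov(toys):
    """Per-bin sample mean and sample covariance matrix over all toys."""
    n, n_bins = len(toys), len(toys[0])
    mean = [sum(t[b] for t in toys) / n for b in range(n_bins)]
    cov = [[sum((t[i] - mean[i]) * (t[j] - mean[j]) for t in toys) / (n - 1)
            for j in range(n_bins)] for i in range(n_bins)]
    return mean, cov

rng = random.Random(1)
expected = [4000.0, 2500.0, 900.0, 20.0]  # hypothetical counts per energy bin
toys = [make_toy(rng, expected, rel_norm_sys=0.01) for _ in range(1000)]
mean, cov = mean_and_cov(toys)
errors = [math.sqrt(cov[b][b]) for b in range(len(expected))]
```

The square roots of the diagonal terms (`errors`) correspond to the kind of error bar quoted for the decomposed spectra, while the off-diagonal terms carry the bin-to-bin correlation induced by the shared systematic shift.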

Fig. 5. (Color online) (Top panel) The decomposed ²³⁵U and
²³⁹Pu spectra. The error bar of each data point is the square
root of the corresponding diagonal term of the decomposed
spectrum covariance matrix. (Bottom panel) Ratio of the
decomposed spectrum to the truth spectrum. The data points of
²³⁵U are displaced for visual clarity of the error bars.


Because the ²³⁸U and ²⁴¹Pu spectra are treated as prior
knowledge, this study presents the decomposition results of
²³⁵U and ²³⁹Pu, whose fitting is principally driven by the
simulated experimental data. As shown in the bottom panel of
Fig. 5, the decomposition results of both isotopes deviate
from the truth spectrum by less than 0.3%, which is
practically unbiased. The decomposed ²³⁵U spectrum has a
smaller uncertainty than the ²³⁹Pu spectrum, mainly because
²³⁵U is the primary source of reactor $\bar{\nu}_e$ and
provides the largest number of antineutrino events.


V. CONCLUSION AND DISCUSSION 


In summary, we propose a machine learning approach to
decompose the ²³⁵U and ²³⁹Pu isotope antineutrino spectra
from the evolution data of a simulated reactor antineutrino
experiment. The CNN decomposition method is applied to
noiseless datasets and to noisy datasets that include the main
uncertainties of a reactor antineutrino experiment. The
validation tests show that the deviations of the decomposed
spectra are less than 0.1% and 0.3%, respectively, so the
method can be viewed as unbiased. The uncertainty introduced
by the CNN method is less than 0.1%, and the statistical and
systematic uncertainties can be evaluated using the Monte
Carlo method.

The CNN decomposition method is also applicable to realistic
commercial reactor antineutrino experiments because the
physical principles of $\bar{\nu}_e$ emission and detection in
these experiments are almost the same as those in the virtual
experiment designed in this study. Unlike the virtual
experiment, realistic experiments commonly employ multiple
reactors and detectors; thus, the coefficient $k_i(E_\nu, t)$
defined in Eq. (7) should be replaced by an effective
coefficient that accounts for the different reactors. The
effective coefficient is calculated as


$$
k_{id}(E_\nu, t) = N_d \, \varepsilon_d \sum_r
\frac{W_r(t) \, f_{ir}(t) \, P_{\mathrm{sur}}(E_\nu, L_{rd})}
     {4\pi L_{rd}^2 \sum_l f_{lr}(t) \, e_l}
\tag{12}
$$

where the subscript $d$ is the detector index, $r$ is the
reactor index, $E_\nu$ is the $\bar{\nu}_e$ energy, $N_d$ is
the target proton number, $\varepsilon_d$ is the detection
efficiency, $L_{rd}$ is the distance from reactor $r$ to
detector $d$, $P_{\mathrm{sur}}(E_\nu, L_{rd})$ is the
$\bar{\nu}_e$ survival probability, $W_r(t)$ is the thermal
power of reactor $r$, $e_l$ is the energy released per fission
for isotope $l$, and $f_{lr}$ is the fission fraction of
reactor $r$ for isotope $l$. This is simply the summation of
the coefficient contributions from individual reactors.
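
As a minimal sketch of how the reactor sum in Eq. (12) could be evaluated for one detector at a fixed energy and time, consider the function below. The function name, the dictionary layout, the unit survival probability, and the rounded energy-release values are assumptions made for illustration; consistent units are left to the caller.

```python
import math

def effective_coefficient(i, n_protons, efficiency, reactors, e_release, p_sur):
    """k_id(E_nu, t) for one detector at a fixed energy and time.

    reactors:  list of dicts with thermal power "W", baseline "L" and
               fission fractions "f" (isotope name -> fraction)
    e_release: isotope name -> energy released per fission
    p_sur:     function of the baseline returning the survival probability
    """
    total = 0.0
    for r in reactors:
        # Mean energy per fission of this core: sum_l f_lr * e_l
        mean_energy = sum(r["f"][l] * e_release[l] for l in r["f"])
        # Fission rate of isotope i in this core: W_r * f_ir / sum_l f_lr e_l
        fission_rate_i = r["W"] * r["f"][i] / mean_energy
        total += fission_rate_i * p_sur(r["L"]) / (4.0 * math.pi * r["L"] ** 2)
    return n_protons * efficiency * total

def no_oscillation(baseline):
    """Hypothetical survival probability: no disappearance at any baseline."""
    return 1.0

# Rounded, illustrative energy releases per fission in MeV (assumed values)
e_release = {"U235": 202.0, "U238": 206.0, "Pu239": 211.0, "Pu241": 214.0}
core = {"W": 1.0, "L": 10.0,
        "f": {"U235": 0.6, "U238": 0.1, "Pu239": 0.25, "Pu241": 0.05}}

k_one = effective_coefficient("U235", 1.0, 1.0, [core], e_release, no_oscillation)
k_two = effective_coefficient("U235", 1.0, 1.0, [core, core], e_release, no_oscillation)
```

With a single reactor the function reduces to a one-term coefficient, and two identical cores at the same baseline give exactly twice that value, which reflects the statement that the effective coefficient is a plain sum over reactors.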


Because experimental operation times vary and baselines range
from ~10 m to ~1000 m, the number of $\bar{\nu}_e$ events
observed per period can differ greatly between experiments.
Thus, the periodic measurement data could be merged and
rearranged into new groups so that the antineutrino event rate
of a sample is on the same scale as in this study, which
guarantees the validity of the $\chi^2$ objective function.
Such efforts would make the CNN method applicable to realistic
experimental cases.
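
One simple way to perform such a regrouping is to merge consecutive periods until each group reaches a target number of events. The greedy rule, the function name `regroup`, and the counts below are assumptions for illustration; the text does not prescribe a specific grouping scheme.

```python
def regroup(period_counts, target):
    """Greedily merge consecutive periods until each group has >= target events.

    Returns a list of groups, each a list of period indices.
    """
    groups, current, total = [], [], 0
    for idx, n in enumerate(period_counts):
        current.append(idx)
        total += n
        if total >= target:
            groups.append(current)
            current, total = [], 0
    if current:
        # Leftover periods join the last group, or form the only group
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups

counts = [300, 450, 280, 900, 120, 150, 700, 90]  # hypothetical events per period
groups = regroup(counts, target=1000)
group_totals = [sum(counts[i] for i in g) for g in groups]
```

Merging only neighbouring periods keeps samples with similar operating conditions together, but other schemes, such as grouping by fission fraction, would serve the same purpose.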


In addition, the decomposition in this study is applied
directly to the antineutrino spectrum. However, in realistic
reactor antineutrino experiments, the $\bar{\nu}_e$ energy
spectrum is measured through the visible prompt energy. The
prompt energy $E_p$ is related to the antineutrino energy
$E_\nu$ as follows:

$$
E_p \simeq E_\nu - 0.78~\mathrm{MeV}. \tag{13}
$$

Therefore, before the decomposition step with the CNN method,
the measured prompt spectrum must be transferred to the
$\bar{\nu}_e$ spectrum (commonly called unfolding), which, in
principle, can be integrated into the layers of the CNN. We
plan to append extra neural network layers to the established
CNN model in future studies to accomplish the unfolding
analysis.
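
The 0.78 MeV offset in Eq. (13) can be traced to the kinematics of inverse beta decay, $\bar{\nu}_e + p \to e^+ + n$. The short derivation below is a sketch that assumes the neutron recoil energy is neglected and that the prompt signal consists of the positron kinetic energy plus the two annihilation photons:

$$
E_{e^+} \simeq E_\nu - (m_n - m_p) \simeq E_\nu - 1.293~\mathrm{MeV},
$$

$$
E_p = (E_{e^+} - m_e) + 2 m_e = E_{e^+} + m_e
\simeq E_\nu - 1.293~\mathrm{MeV} + 0.511~\mathrm{MeV}
\simeq E_\nu - 0.78~\mathrm{MeV}.
$$

A full unfolding must also account for detector resolution and energy nonlinearity, so this shift is only the leading-order part of the conversion.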

In the near future, very short-baseline reactor antineutrino
experiments are expected to measure the reactor antineutrino
spectrum with higher precision and energy resolution. The
decomposition approach introduced and demonstrated in this
paper could be applied in these experiments to provide the
most up-to-date individual isotope antineutrino spectra.